rmse score
Enhancing Epidemic Forecasting: Evaluating the Role of Mobility Data and Graph Convolutional Networks
Guo, Suhan, Xu, Zhenghao, Shen, Furao, Zhao, Jian
Accurate prediction of contagious disease outbreaks is vital for informed decision-making. Our study addresses the gap between machine learning algorithms and their epidemiological applications, noting that methods optimal for benchmark datasets often underperform with real-world data due to difficulties in incorporating mobility information. We adopt a two-phase approach: first, assessing the significance of mobility data through a pilot study, then evaluating the impact of Graph Convolutional Networks (GCNs) on a transformer backbone. Our findings reveal that while mobility data and GCN modules do not significantly enhance forecasting performance, the inclusion of mortality and hospitalization data markedly improves model accuracy. Additionally, a comparative analysis between GCN-derived spatial maps and lockdown orders suggests a notable correlation, highlighting the potential of spatial maps as sensitive indicators for mobility. Our research offers a novel perspective on mobility representation in predictive modeling for contagious diseases, empowering decision-makers to better prepare for future outbreaks.
Multi-modal cascade feature transfer for polymer property prediction
Obuchi, Kiichi, Yahagi, Yuta, Toyama, Kiyohiko, Tanaka, Shukichi, Matsui, Kota
In this paper, we propose a novel transfer learning approach, a multi-modal cascade model with feature transfer, for polymer property prediction. Polymers are characterized by a composite of data in several different formats, including molecular descriptors and additive information as well as chemical structures. However, conventional approaches have often constructed prediction models using each type of data separately. Our model enables more accurate prediction of physical properties for polymers by combining features extracted from the chemical structure by graph convolutional neural networks (GCN) with features such as molecular descriptors and additive information. The predictive performance of the proposed method is empirically evaluated using several polymer datasets. We report that the proposed method shows high predictive performance compared to the baseline conventional approach using a single feature.
Exploring Design Choices for Autoregressive Deep Learning Climate Models
Gallusser, Florian, Hentschel, Simon, Krause, Anna, Hotho, Andreas
Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2025. Deep Learning (DL) models have achieved state-of-the-art performance in medium-range weather prediction (MWP) but often fail to maintain physically consistent rollouts beyond 14 days. In contrast, a few atmospheric models demonstrate stability over decades, though the key design choices enabling this remain unclear. This study quantitatively compares the long-term stability of three prominent DL-MWP architectures (FourCastNet, SFNO, and ClimaX) trained on ERA5 reanalysis data at 5.625° resolution. We systematically assess the impact of autoregressive training steps, model capacity, and choice of prognostic variables, identifying configurations that enable stable 10-year rollouts while preserving the statistical properties of the reference dataset. Notably, rollouts with SFNO exhibit the greatest robustness to hyperparameter choices, yet all models can experience instability depending on the random seed and the set of prognostic variables. Over the past few years, autoregressive Deep Learning (DL) models have emerged that are on par with physics-based state-of-the-art medium-range weather prediction systems while requiring only a fraction of the computational cost for inference (Lam et al., 2023; Bi et al., 2023; Price et al., 2025).
Before It's Too Late: A State Space Model for the Early Prediction of Misinformation and Disinformation Engagement
Tian, Lin, Booth, Emily, Bailo, Francesco, Droogan, Julian, Rizoiu, Marian-Andrei
In today's digital age, conspiracies and information campaigns can emerge rapidly and erode social and democratic cohesion. While recent deep learning approaches have made progress in modeling engagement through language and propagation models, they struggle with irregularly sampled data and early trajectory assessment. We present IC-Mamba, a novel state space model that forecasts social media engagement by modeling interval-censored data with integrated temporal embeddings. Our model excels at predicting engagement patterns within the crucial first 15-30 minutes of posting (RMSE 0.118-0.143), enabling rapid assessment of content reach. By incorporating interval-censored modeling into the state space framework, IC-Mamba captures fine-grained temporal dynamics of engagement growth, achieving a 4.72% improvement over state-of-the-art across multiple engagement metrics (likes, shares, comments, and emojis). Our experiments demonstrate IC-Mamba's effectiveness in forecasting both post-level dynamics and broader narrative patterns (F1 0.508-0.751 for narrative-level predictions). The model maintains strong predictive performance across extended time horizons, successfully forecasting opinion-level engagement up to 28 days ahead using observation windows of 3-10 days. These capabilities enable earlier identification of potentially problematic content, providing crucial lead time for designing and implementing countermeasures. Code is available at: https://github.com/ltian678/ic-mamba. An interactive dashboard demonstrating our results is available at: https://ic-mamba.behavioral-ds.science.
ArchesWeather: An efficient AI weather forecasting model at 1.5° resolution
Couairon, Guillaume, Lessig, Christian, Charantonis, Anastase, Monteleoni, Claire
One of the guiding principles for designing AI-based weather forecasting systems is to embed physical constraints as inductive priors in the neural network architecture. A popular prior is locality, where the atmospheric data is processed with local neural interactions, like 3D convolutions or 3D local attention windows as in Pangu-Weather. On the other hand, some works have shown great success in weather forecasting without this locality principle, at the cost of a much higher parameter count. In this paper, we show that the 3D local processing in Pangu-Weather is computationally sub-optimal. We design ArchesWeather, a transformer model that combines 2D attention with a column-wise attention-based feature interaction module, and demonstrate that this design improves forecasting skill. ArchesWeather is trained at 1.5° resolution and 24h lead time, with a training budget of a few GPU-days and a lower inference cost than competing methods. An ensemble of four of our models shows better RMSE scores than the IFS HRES and is competitive with the 1.4° 50-member NeuralGCM ensemble for forecasts one to three days ahead. Our code and models are publicly available at https://github.com/gcouairon/ArchesWeather.
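To make the ensemble RMSE comparison concrete, here is a minimal sketch of scoring an ensemble-mean forecast on a latitude-longitude grid. It assumes a cosine-of-latitude weighting, which is a common convention in weather benchmarks but is not stated in the abstract; the function names `ensemble_mean` and `lat_weighted_rmse` and the nested-list grid layout are hypothetical choices for illustration, not taken from the ArchesWeather code.

```python
import math


def ensemble_mean(members):
    """Average several forecasts cell by cell.

    `members` is a list of 2D grids (lists of latitude rows), all the same shape.
    """
    n = len(members)
    rows = len(members[0])
    return [
        [sum(m[i][j] for m in members) / n for j in range(len(members[0][i]))]
        for i in range(rows)
    ]


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE in which each latitude row is weighted by cos(latitude).

    Assumption: weights are normalized so that they average to 1 over rows,
    which compensates for grid cells shrinking toward the poles.
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_weight = sum(weights) / len(weights)
    total = 0.0
    count = 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += (w / mean_weight) * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)
```

In this sketch, the members are averaged first and the mean field is then scored against the reference, so random errors in individual members partly cancel before the RMSE is computed.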
Function Extrapolation with Neural Networks and Its Application for Manifolds
This paper addresses the problem of accurately estimating a function on one domain when only its discrete samples are available on another domain. To answer this challenge, we utilize a neural network, which we train to incorporate prior knowledge of the function. In addition, by carefully analyzing the problem, we obtain a bound on the error over the extrapolation domain and define a condition number for this problem that quantifies the level of difficulty of the setup. Compared to other machine learning methods that provide time series prediction, such as transformers, our approach is suitable for setups where the interpolation and extrapolation regions are general subdomains and, in particular, manifolds. In addition, our construction leads to an improved loss function that helps us boost the accuracy and robustness of our neural network. We conduct comprehensive numerical tests and comparisons of our extrapolation versus standard methods. The results illustrate the effectiveness of our approach in various scenarios.
Symbolic Regression as Feature Engineering Method for Machine and Deep Learning Regression Tasks
Shmuel, Assaf, Glickman, Oren, Lazebnik, Teddy
In machine and deep learning regression tasks, effective feature engineering (FE) plays a pivotal role in enhancing model performance. Traditional approaches to FE often rely on domain expertise to manually design features for machine learning models. In deep learning models, FE is embedded in the neural network's architecture, which makes it hard to interpret. In this study, we propose integrating symbolic regression (SR) as an FE process before a machine learning model to improve its performance. We show, through extensive experimentation on synthetic and real-world physics-related datasets, that incorporating SR-derived features significantly enhances the predictive capabilities of both machine and deep learning regression models, with a 34-86% root mean square error (RMSE) improvement on synthetic datasets and a 4-11.5% improvement on real-world datasets. In addition, as a realistic use case, we show that the proposed method improves machine learning performance in predicting superconducting critical temperatures based on Eliashberg theory by more than 20% in terms of RMSE. These results outline the potential of SR as an FE component in data-driven models.
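For readers who want to see how an RMSE improvement percentage of this kind is typically computed, the following is a minimal sketch. It assumes the improvement is reported as the relative reduction in RMSE against a baseline model, which the abstract does not spell out; the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to the baseline (assumed definition)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse
```

Under this definition, a model that lowers the RMSE from 2.0 to 1.5 would be described as a 25% improvement.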
Predicting Zip Code-Level Vaccine Hesitancy in US Metropolitan Areas Using Machine Learning Models on Public Tweets
Melotte, Sara, Kejriwal, Mayank
Although the recent rise and uptake of COVID-19 vaccines in the United States has been encouraging, there continues to be significant vaccine hesitancy in various geographic and demographic clusters of the adult population. Surveys, such as the one conducted by Gallup over the past year, can be useful in determining vaccine hesitancy, but can be expensive to conduct and do not provide real-time data. At the same time, the advent of social media suggests that it may be possible to get vaccine hesitancy signals at an aggregate level (such as at the level of zip codes) by using machine learning models and socioeconomic (and other) features from publicly available sources. It is an open question at present whether such an endeavor is feasible, and how it compares to baselines that only use constant priors. To our knowledge, a proper methodology and evaluation results using real data have also not been presented. In this article, we present such a methodology and experimental study, using publicly available Twitter data collected over the last year. Our goal is not to devise novel machine learning algorithms, but to evaluate existing and established models in a comparative framework. We show that the best models significantly outperform constant priors, and can be set up using open-source tools.
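A constant-prior baseline of the kind mentioned above can be sketched as follows. This assumes the prior is the mean of the training targets and that models are compared by RMSE; the abstract specifies neither, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def constant_prior_predictions(train_targets, n_predictions):
    """Predict the training mean for every test example (assumed baseline)."""
    prior = sum(train_targets) / len(train_targets)
    return [prior] * n_predictions


def beats_constant_prior(train_targets, test_targets, model_predictions):
    """Return True if the model's RMSE is lower than the constant prior's."""
    baseline = constant_prior_predictions(train_targets, len(test_targets))
    return rmse(test_targets, model_predictions) < rmse(test_targets, baseline)
```

A model is only worth deploying in this setting if `beats_constant_prior` holds on held-out zip codes, since the constant prior needs no features at all.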
The Solution Approach Of The Great Indian Hiring Hackathon: Winners' Take
MachineHack concluded The Great Indian Hiring Hackathon on 23 November 2020, collaborating for the first time with 12 companies to help data science professionals land rewarding careers. In this hackathon, the MachineHack community was asked to come up with an algorithm to predict the price of retail items belonging to different categories. Run in partnership with Aditya Birla Group, Bridgei2i, Concentrix, Fractal, Genpact, Lowe's, MiQ, Piramal, Scienaptic, VMware, Wells Fargo, and Zycus, the hackathon drew a whopping 5,655 practitioners. Predicting retail prices can be a daunting task because the datasets are large and mix attributes of many types: text, numbers (floats, integers), and dates and times. Outliers can also be a big problem when dealing with unit prices. The hackathon therefore asked participants to come up with a solution for forecasting the retail prices of items in different categories.
Addressing Item-Cold Start Problem in Recommendation Systems using Model Based Approach and Deep Learning
Obadić, Ivica, Madjarov, Gjorgji, Dimitrovski, Ivica, Gjorgjevikj, Dejan
Traditional recommendation systems rely on past usage data to generate new recommendations. These approaches fail to generate sensible recommendations for users and items that are new to the system, because no information about their past interactions is available. In this paper, we propose a solution to the item cold-start problem that uses a model-based approach and recent advances in deep learning. In particular, we use a latent factor model for recommendation, and predict the latent factors from items' descriptions using a convolutional neural network when they cannot be obtained from usage data. Latent factors obtained by applying matrix factorization to the available usage data are used as ground truth to train the convolutional neural network. To create latent factor representations for new items, the convolutional neural network uses their textual descriptions. The experimental results reveal that the proposed approach significantly outperforms several baseline estimators.